A semantically confidence-weighted ITG induction algorithm
Abstract
We propose a new algorithm to induce inversion transduction grammars, in which a crosslingual semantic frame based objective function is injected as confidence weighting in the early stages of statistical machine translation training. Unlike recent work on improving translation adequacy that uses a monolingual semantic frame based objective function to drive the tuning of loglinear mixture weights in the late stages of statistical machine translation training, our bilingual approach incorporates the semantic objective during the actual learning of the translation model’s structure. Our approach assigns higher confidence to training examples in which the semantic frames in the input language more closely match the semantic frames of the output language, as predicted automatically by XMEANT, the crosslingual semantic frame based machine translation evaluation metric. We chose to apply this approach to induce inversion transduction grammars (ITGs), since ITG alignments prune a large majority of the space of possible alignments, while at the same time empirically fully covering all the crosslingual semantic frame alternations of the type we are using for confidence weighting. Results show that boosting semantically compatible training examples in ITG induction improves the translation performance compared to either traditional GIZA++ alignment or conventional ITG alignment based approaches for phrase based statistical machine translation.
Copyright © by the paper’s authors. Copying permitted for private and academic purposes. In Proceedings of 3rd International Workshop on Semantic Machine Learning (SML 2016), 10th July 2016, New York City, NY, USA.
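As a rough illustration of what confidence weighting of training examples can mean in practice, the toy sketch below scales each sentence pair's expected rule counts by a per-example confidence score before normalizing them into rule probabilities. It is not the paper's algorithm: the function name `weighted_rule_probs`, the data layout, and the idea of passing the confidence as a plain number in [0, 1] (standing in for an XMEANT-style score) are all assumptions made for illustration, and the expected counts are taken as given rather than computed by ITG parsing.

```python
from collections import defaultdict


def weighted_rule_probs(examples):
    """Estimate p(target | source) from confidence-weighted counts.

    `examples` is an iterable of (rule_counts, confidence) pairs, where
    `rule_counts` maps (source_word, target_word) to an expected count for
    one sentence pair, and `confidence` is a hypothetical score in [0, 1].
    """
    rule_totals = defaultdict(float)
    source_totals = defaultdict(float)
    for rule_counts, confidence in examples:
        for (src, tgt), count in rule_counts.items():
            # Scale this example's contribution by its confidence.
            weighted = confidence * count
            rule_totals[(src, tgt)] += weighted
            source_totals[src] += weighted
    # Normalize per source word, skipping sources with zero total weight.
    return {
        (src, tgt): total / source_totals[src]
        for (src, tgt), total in rule_totals.items()
        if source_totals[src] > 0
    }


# Two sentence pairs propose different translations of the same source word;
# the higher-confidence pair dominates the resulting estimate.
examples = [
    ({("chat", "cat"): 1.0}, 0.9),
    ({("chat", "chat"): 1.0}, 0.1),
]
probs = weighted_rule_probs(examples)
```

With uniform confidence this reduces to ordinary relative-frequency estimation, so the weighting only changes the outcome when examples disagree and their scores differ.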
Similar resources
Driving inversion transduction grammar induction with semantic evaluation
We describe a new technique for improving statistical machine translation training by adopting scores from a recent crosslingual semantic frame based evaluation metric, XMEANT, as outside probabilities in expectation-maximization based ITG (inversion transduction grammars) alignment. Our new approach strongly biases early-stage SMT learning towards semantically valid alignments. Unlike previous...
Improving word alignment for low resource languages using English monolingual SRL
We introduce a new statistical machine translation approach specifically geared to learning translation from low resource languages, that exploits monolingual English semantic parsing to bias inversion transduction grammar (ITG) induction. We show that in contrast to conventional statistical machine translation (SMT) training methods, which rely heavily on phrase memorization, our approach focu...
Hierarchical Translation Equivalence over Word Alignments
We present a theory of word alignments in machine translation (MT) that equips every word alignment with a hierarchical representation with exact semantics defined over the translation equivalence relations known as hierarchical phrase pairs. The hierarchical representation consists of a set of synchronous trees (called Hierarchical Alignment Trees – HATs), each specifying a bilingual compositi...
ITG: A New Global GNSS Tropospheric Correction Model
Tropospheric correction models are receiving increasing attention, as they play a crucial role in Global Navigation Satellite Systems (GNSS). The most commonly used models to date include the GPT2 series and TropGrid2. In this study, we analyzed the advantages and disadvantages of existing models and developed a new model called the Improved Tropospheric Grid (ITG). ITG considers annual, semi-a...
Confidence Interval Estimation of the Mean of Stationary Stochastic Processes: a Comparison of Batch Means and Weighted Batch Means Approach (TECHNICAL NOTE)
Suppose that we have one run of n observations of a stochastic process obtained by computer simulation and would like to construct a confidence interval for the steady-state mean of the process. Seeking independent observations, so that the classical statistical methods can be applied, we can divide the n observations into k batches of length m (n = k·m) or alternatively, transform the cor...
Publication date: 2016